Over the past several years, social scientists have noted that the discourse of "diversity" is expanding across a variety of organizations (Berrey 2015; Ray 2019). These authors argue that the discourse of diversity is a mechanism for dominant (white) racial groups in power to pay lip service to racial inequity without addressing it through the reallocation of resources. Our interest is instead in better understanding how diversity is used in biomedical research. Specifically, we are interested in eventually understanding how diversity may or may not be replacing the discourse of race and ethnicity. First, however, we need to establish how often diversity is used within biomedical research. On this page, we measure how often diversity and its various metonyms (i.e., related phrases) have been used in biomedical research over the past three decades (1990-2020). Before getting started, we posed a working hypothesis to guide our work:
- Hypothesis 1 (H1): The use of the term 'diversity' and its related terminology has increased in biomedical abstracts from 1990-2020.
To test Hypothesis 1, we opted to use a supervised approach to text mining that depends on the use of a "nested dictionary." This means that we constructed several subdictionaries of terms that can be classified into 11 different categories of "diversity," including cultural, disability, diversity, equity/justice, lifecourse, migration, minority, race/ethnicity, sex/gender, and sexuality. Each of these categories contains between 4 and 104 terms of interest to us. For a full list of terms in each category, see Supplementary Tables 1A and 2A below or our Methods page.
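To make the idea of a nested dictionary concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the code used for our analyses: the three categories shown are a subset of those listed above, the handful of terms in each are placeholders and not the actual term lists (see the supplementary tables for those), and the names `NESTED_DICTIONARY` and `build_term_lookup` are hypothetical.

```python
# Illustrative nested dictionary: category name -> list of terms.
# ASSUMPTION: these terms are placeholders, not the real term lists.
NESTED_DICTIONARY = {
    "diversity": ["diversity", "diverse"],
    "race/ethnicity": ["race", "ethnicity"],
    "sex/gender": ["women", "men", "gender"],
}


def build_term_lookup(nested):
    """Invert the nested dictionary into term -> list of categories.

    A list is used because, in principle, the same term could be
    assigned to more than one category.
    """
    lookup = {}
    for category, terms in nested.items():
        for term in terms:
            lookup.setdefault(term.lower(), []).append(category)
    return lookup


TERM_LOOKUP = build_term_lookup(NESTED_DICTIONARY)
```

With this structure, looking up a term such as "women" returns the category or categories it belongs to, which is what lets term counts be rolled up to the category level.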
Once these dictionaries were aggregated, we developed a strategy that (1) counted how often the terms within each category were mentioned in biomedical abstracts each year and (2) calculated the percentage of all abstracts in which those terms appeared, to control for the overall rise in publications over time. Below, we provide a simple description of the variations in term usage, while our manuscript provides a more detailed analysis of the implications for our broader arguments.
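The two-step strategy described above can be sketched roughly as follows. This is a simplified Python illustration, not our actual pipeline: the dictionary is a hypothetical two-category stand-in, abstracts are assumed to arrive as `(year, text)` pairs, matching is assumed to be case-insensitive on whole words, and the function names are invented for this example.

```python
import re
from collections import defaultdict

# ASSUMPTION: placeholder terms, not the real term lists.
NESTED_DICTIONARY = {
    "diversity": ["diversity", "diverse"],
    "sex/gender": ["women", "men", "gender"],
}


def compile_patterns(nested):
    """Build one case-insensitive, whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in nested.items()
    }


def summarize(abstracts, nested):
    """Return one row per (year, category) with raw and proportional counts.

    abstracts: iterable of (year, abstract_text) pairs.
    """
    patterns = compile_patterns(nested)
    total_abstracts = defaultdict(int)  # year -> number of abstracts
    mentions = defaultdict(int)         # (year, category) -> term mentions
    containing = defaultdict(int)       # (year, category) -> abstracts with a mention

    for year, text in abstracts:
        total_abstracts[year] += 1
        for category, pattern in patterns.items():
            n = len(pattern.findall(text))
            mentions[(year, category)] += n
            if n > 0:
                containing[(year, category)] += 1

    rows = []
    for year in sorted(total_abstracts):
        for category in nested:
            hits = containing[(year, category)]
            rows.append({
                "year": year,
                "category": category,
                "mentions": mentions[(year, category)],       # step (1)
                "abstracts_with_term": hits,
                "percent_of_abstracts": 100.0 * hits / total_abstracts[year],  # step (2)
            })
    return rows
```

The `mentions` column corresponds to the raw frequencies discussed in the next section, and `percent_of_abstracts` to the proportions discussed after that. Dividing by the number of abstracts published in the same year is what keeps overall publication growth from being mistaken for growth in term usage.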
Raw Growth in Biomedical Abstracts
First, we analyzed the raw frequencies of diversity-related terms.
Figure 1A shows the growth in the raw word frequencies of diversity-related terms from 1990-2017. This plot shows that use of the term diversity has grown consistently over time (from only 2 mentions in 1992 to 381 in 2017). This trend offers preliminary support for H1, but it is only one small part of how diversity-related terms are changing in use. Overall, most of the terms in this graph have increased, but we see the most notable growth in terms related to sex/gender and aging research. While this growth is in part because scientists now have a larger dictionary of terms to describe sexed/gendered and aging populations, the focus on these topics is not simply a function of a more comprehensive vocabulary. Supplementary Table 1B provides the totals of the top terms used in the aging, class, sex/gender, and sexualities categories, showing that, for example, the use of "women" would have the fifth highest total in the final year of this plot. Figure 1A also shows that terms like population, genetic, and cultural have risen notably over time, while terms associated with race/ethnicity, minority, class, sexuality, social class, and ancestry have all grown at a slower pace than the set of diversity terms.
Proportional Growth in Biomedical Abstracts
To normalize for the overall growth in publications, we next look at change in the proportion of available abstracts that contain each set of terms.
While the raw totals suggested that diversity-related terms are rising, the proportions outlined in Figure 1B suggest that the sets of diversity-related terms have changed relatively little over time. For example, the terms associated with aging, ancestry, minority, sex/gender, sexualities, social class, and race/ethnicity have stayed relatively stable over time. Perhaps the most interesting trend is the sex/gender line, which rises from 8% to 12% of articles from 1990 to 1996 before returning to around 8% in 2005. In contrast, the terms associated with population, genetic, class, and diversity have all steadily increased from their baseline in 1990.
Main Takeaways
Overall, our results provide support for the notion that the use of the term "diversity" is increasing in biomedical abstracts, but taken as a whole this growth appears quite modest. The raw term frequencies suggest that diversity-related terminology has grown dramatically over time, but the proportional analyses temper these findings by showing that some trends, like sex/gender and race/ethnicity, have actually declined in recent years. Future work will need to examine what social, political, and economic factors may have contributed to these declines.
Appendices
Here is a list of the terms in each category analyzed above. You can scroll through each category or use the search tool to see if a term of interest was used in the analyses.